Skip to content

feat: add cpptrace for better stack traces #4042

Draft
kdrienCG wants to merge 15 commits intodevelopfrom
feature/kdrienCG/betterStackTrace
Draft

feat: add cpptrace for better stack traces #4042
kdrienCG wants to merge 15 commits intodevelopfrom
feature/kdrienCG/betterStackTrace

Conversation

@kdrienCG
Copy link
Copy Markdown

@kdrienCG kdrienCG commented Apr 28, 2026

(Requires TPL PR #347)

This PR aims to improve the stack trace output when an error is thrown.

When GEOS crashes from a C++ exception, the stack trace we log is not really useful: it points at main.cpp instead of the code that actually threw, missing on valuable informations for devs. This is not a bug in the error handler, it is how the C++ runtime works.
cpptrace solves this by providing a mechanism to get stack traces from thrown exceptions, that will make us able to know exactly where the code went.

Here is an example defining a myThrowingFunction() incorrectly using a map directly (written in the main.cpp for convenience)

Without cpptrace:

***** Exception
***** Rank 0
***** Message :
map::at

***** StackTrace of 7 frames
  - Frame  0:  geos::ErrorLogger::initCurrentExceptionMessage(geos::MsgType, std::basic_string_view<char, std::char_traits<char> >, int) 
  - Frame  1:  main /somewhere/in/my/disk/geos/GEOS/src/main/main.cpp:272 (discriminator 3)
  - Frame  2:  __libc_start_main 
  - Frame  3:  _start 
  - Frame  4:  main /somewhere/in/my/disk/geos/GEOS/src/main/main.cpp:275 (discriminator 2)
  - Frame  5:  __libc_start_main 
  - Frame  6:  _start 
*****

With cpptrace:
(truncated to fit the PR description)

***** Exception
***** Rank 0
***** Message :
map::at

***** StackTrace of 15 frames
  - Frame  0: at libstdc++.so.6
  - Frame  1: in __gxx_personality_v0 at libstdc++.so.6
  - Frame  2: at libstdc++.so.6
  - Frame  3: in std::map<std::string, int, std::less<std::string>>::at(std::string const&) at stl_map.h:551:24
  - Frame  4: in myThrowingFunction() at main.cpp:40:24
  - Frame  5: in operator() at main.cpp:49:16
  - Frame  6: in do_try_catch<const geos::NotAnError&, main(int, char**)::<lambda()>&, main(int, char**)::<lambda...
  - Frame  7: in operator() at from_current.hpp:143:32
  - Frame  8: in operator() at from_current.hpp:143:32
  - Frame  9: in operator() at from_current.hpp:143:32
  - Frame 10: in try_catch_impl<main(int, char**)::<lambda()>&, main(int, char**)::<lambda(const geos::NotAnError...
  - Frame 11: in try_catch<main(int, char**)::<lambda()>&, main(int, char**)::<lambda(const geos::NotAnError&)>&...
  - Frame 12: in main at main.cpp:129:22
  - Frame 13: in __libc_start_main at libc.so.6
  - Frame 14: in _start at geosx
*****

We have a detailled stack trace with the name of the function that throws and the correct line and character position.

@kdrienCG kdrienCG self-assigned this Apr 28, 2026
@kdrienCG kdrienCG added type: feature New feature or request flag: requires updated TPL(s) Needs a specific TPL PR labels Apr 28, 2026
@kdrienCG kdrienCG marked this pull request as ready for review May 5, 2026 16:50
@rrsettgast
Copy link
Copy Markdown
Member

@kdrienCG Is there a discussion somewhere about whether we want to add this dependency?

@kdrienCG kdrienCG marked this pull request as draft May 6, 2026 09:31
@herve-gross
Copy link
Copy Markdown
Contributor

@rrsettgast yes, we discussed this internally. There is value in having this TPL for debugging deeper issues. We can talk if you wish.

@herve-gross herve-gross requested review from bd713 and jafranc May 6, 2026 16:14
@rrsettgast
Copy link
Copy Markdown
Member

rrsettgast commented May 6, 2026

@rrsettgast yes, we discussed this internally. There is value in having this TPL for debugging deeper issues. We can talk if you wish.

@herve-gross That decision being made without consulting others on the core team is inappropriate. The inclusion of this package covers the fundamental problem in the approach to c++ exceptions in an HPC simulation code. There are simpler solutions...like output stack on the throw.

Comment on lines +219 to +226
while( std::getline( iss, stackLine ) )
{
if( std::regex_search( stackLine, pattern ))
std::smatch m;
if( std::regex_search( stackLine, m, lvArrayPattern ) ||
std::regex_search( stackLine, m, cpptracePattern ) )
{
m_errorMsg.m_isValidStackTrace = true;
index = stackLine.find( ':' );
m_errorMsg.m_sourceCallStack.push_back( stackLine.substr( index + 1 ) );
m_errorMsg.m_sourceCallStack.push_back( stackLine.substr( m.position() + m.length() ) );
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
while( std::getline( iss, stackLine ) )
{
if( std::regex_search( stackLine, pattern ))
std::smatch m;
if( std::regex_search( stackLine, m, lvArrayPattern ) ||
std::regex_search( stackLine, m, cpptracePattern ) )
{
m_errorMsg.m_isValidStackTrace = true;
index = stackLine.find( ':' );
m_errorMsg.m_sourceCallStack.push_back( stackLine.substr( index + 1 ) );
m_errorMsg.m_sourceCallStack.push_back( stackLine.substr( m.position() + m.length() ) );
while( std::getline( iss, stackLine ) )
{
std::smatch m;
if( std::regex_search( stackLine, m, lvArrayPattern ) ||
std::regex_search( stackLine, m, cpptracePattern ) )
{
m_errorMsg.m_isValidStackTrace = true;
m_errorMsg.m_sourceCallStack.push_back( m.suffix().str() );
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be tested but might be interesting to leverage std::smatch to its max

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified in commit e4334c2

* @note Not signal-safe. Use signalSafeStackTrace() from inside a signal handler.
*/
static std::string stackTrace();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/**
* @brief Get a stack trace from a context where signal-safety is required.
* @return The stack trace as a string.
* @note If signal-safe is required, revert back to LvArray::system
*/

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is worth documenting here that behavior for signal-safe is dropping cpptrace

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in commit 5cec03

Comment thread src/main/main.cpp
Comment on lines -80 to +87
catch( geos::Exception & e )
};

auto onGeosException = [&]( geos::Exception & e )
{ // GEOS generated exceptions management
#ifdef GEOS_USE_CPPTRACE
std::string const stacktrace = StackTrace::formatStackTrace( cpptrace::from_current_exception() );
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the MACRO dispach will occur at each new lambda-exception. Maybe worth passing a callback through the lambda's interface and dispatch all below ? In case we add more of this and the duplication spread

Comment thread src/main/main.cpp
int main( int argc, char *argv[] )
{
try
auto runMain = [&]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[] or [&] ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[&] to capture argc and argv[]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

flag: requires updated TPL(s) Needs a specific TPL PR type: feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants